Text preprocessing is a key problem in pattern recognition and plays an important role in text classification. It has two pivotal steps: feature selection and feature weighting. Because the preprocessing results directly affect a classifier's accuracy and performance, choosing appropriate algorithms for feature selection and feature weighting can greatly improve classification. Based on Gini Index theory, this paper proposes an Improved Gini Index algorithm that constructs a new feature selection and feature weighting function. Experimental results show that the algorithm effectively improves classifier performance. The algorithm has also been applied to a sensitive information identification system, where it achieves higher precision and recall than traditional algorithms and identifies sensitive information on the Internet effectively.
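
For orientation, Gini-based feature selection can be sketched as follows. This is only a baseline illustration: it scores a term t by the class purity sum over classes c of P(c | t)^2, estimated from document counts, and is not the improved function proposed in this paper, whose form is not stated here. The function names `gini_scores` and `select_features` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def gini_scores(docs, labels):
    """Score each term t by sum_c P(c | t)^2, using document counts.

    Assumption: a plain purity-style Gini score, not the paper's
    improved function. A score of 1.0 means the term occurs in
    documents of a single class only.
    """
    # term -> class -> number of documents of that class containing the term
    term_class_df = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            term_class_df[term][label] += 1

    scores = {}
    for term, per_class in term_class_df.items():
        total = sum(per_class.values())
        scores[term] = sum((n / total) ** 2 for n in per_class.values())
    return scores


def select_features(docs, labels, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = gini_scores(docs, labels)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Hypothetical toy corpus: tokenized documents with class labels.
docs = [
    ["free", "offer", "now"],
    ["free", "prize"],
    ["meeting", "now"],
    ["meeting", "agenda"],
]
labels = ["spam", "spam", "ham", "ham"]

scores = gini_scores(docs, labels)
selected = select_features(docs, labels, 5)
```

In this toy corpus, "now" appears equally in both classes and so receives the lowest score, while terms confined to one class score 1.0. The baseline ignores how often a term occurs, so rare terms can score as highly as frequent ones.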